Discussion: Local Rademacher Complexities and Oracle Inequalities in Risk Minimization, by A. B. Tsybakov
Author: A. B. Tsybakov
Abstract
The paper of Vladimir Koltchinskii has been circulating for several years and has already become an important reference in statistical learning theory. One of the main achievements of the paper (further abbreviated as [VK]) is to propose very general techniques for proving oracle inequalities for the excess risk under a control of the variance, that is, for example, under conditions (6.1) or (6.2) (often called margin or low noise conditions) or similar assumptions in terms of the $L_2$-diameters $D_P(\mathcal{F}, \delta)$ and other related characteristics. These conditions lead to fast rates for the excess risk, that is, to rates that are faster than $n^{-1/2}$. The setup in [VK] is classical: methods based on empirical risk minimizers (ERM) $\hat f_n$ are studied under bounded loss functions.

My comments and questions will be mainly about the optimality of the excess risk bounds. This issue is not at all obvious, even in the case where the underlying class $\mathcal{F}$ is finite. We assume in what follows that either $\mathcal{F} = \{f_1, \dots, f_M\}$, where the $f_j$ are some functions on $S$, or this class is a convex hull $\mathcal{F} = \mathrm{conv}\{f_1, \dots, f_M\}$. Such classes $\mathcal{F}$ are used in aggregation problems where the functions $f_j$ are viewed either as "weak learners" or as some preliminary estimators constructed from a training sample which is considered as frozen in further analysis.

Let $Z_1, \dots, Z_n$ be i.i.d. random variables taking values in a space $\mathcal{Z}$, with common distribution $P$, and denote by $\mathcal{F}_0$ the space where the $f_j$ live. Consider a loss function $Q : \mathcal{Z} \times \mathcal{F}_0 \to \mathbb{R}$ and the associated risk $R(f) = E\,Q(Z, f)$, assuming that the expectation $E\,Q(Z, f)$ is finite for all $f \in \mathcal{F}_0$, where $Z$ has the same distribution as $Z_i$. Introduce two oracle risks: $R_{MS} = \min_{1 \le j \le M} R(f_j)$, corresponding to model selection-type aggregation (MS-aggregation), and $R_C = \inf_{f \in \mathrm{conv}\{f_1, \dots, f_M\}} R(f)$, corresponding to convex aggregation (C-aggregation). The excess risk of a statistic $\tilde f_n(Z_1, \dots, Z_n)$ is defined by $\mathcal{E}(\tilde f_n) = E\{R(\tilde f_n)\} - R_{OR}$, where the oracle risk $R_{OR}$ equals either $R_{MS}$ or $R_C$. A natural question about optimality is how to find an estimator $\tilde f_n$ for which the excess risk is as small
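The definitions in the abstract (risk R(f), the MS-oracle risk R_MS, and the excess risk of an ERM selector over a finite dictionary) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the discussion itself: the squared loss, the toy data-generating process, the three dictionary functions, and the helper names `empirical_risk` and `erm_over_dictionary`. The true risks are replaced by a Monte Carlo proxy, and the reported quantity is the excess risk for one realization of the training sample, not its expectation over samples.

```python
import random

def empirical_risk(loss, f, sample):
    """Average loss of f over the sample: (1/n) * sum_i Q(Z_i, f)."""
    return sum(loss(z, f) for z in sample) / len(sample)

def erm_over_dictionary(loss, dictionary, sample):
    """Index minimising the empirical risk over {f_1, ..., f_M}, plus all empirical risks."""
    risks = [empirical_risk(loss, f, sample) for f in dictionary]
    j_hat = min(range(len(dictionary)), key=risks.__getitem__)
    return j_hat, risks

def squared_loss(z, f):
    """Hypothetical bounded loss Q(z, f) = (y - f(x))^2 for z = (x, y) with y in [0, 1]."""
    x, y = z
    return (y - f(x)) ** 2

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def draw(n):
    """Toy distribution P (an assumption): x uniform on [0, 1], y = x + noise, clipped to [0, 1]."""
    sample = []
    for _ in range(n):
        x = rng.random()
        y = min(1.0, max(0.0, x + rng.uniform(-0.1, 0.1)))
        sample.append((x, y))
    return sample

# Hypothetical finite dictionary F = {f_1, f_2, f_3}.
dictionary = [lambda x: 0.5, lambda x: x, lambda x: x * x]

# ERM on a training sample of size n = 200.
train = draw(200)
j_hat, emp_risks = erm_over_dictionary(squared_loss, dictionary, train)

# Monte Carlo proxy for the true risks R(f_j), using a large independent sample.
holdout = draw(50_000)
true_risks = [empirical_risk(squared_loss, f, holdout) for f in dictionary]

r_ms = min(true_risks)              # proxy for the MS-oracle risk R_MS
excess = true_risks[j_hat] - r_ms   # excess risk of the selector for this one sample

print("selected index:", j_hat)
print("approximate excess risk over R_MS:", excess)
```

Averaging `excess` over many independent training samples would approximate the expectation in the definition of the excess risk. The C-oracle risk R_C would require minimising over the convex hull of the dictionary, which this sketch does not attempt.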
Similar resources
Discussion of “2004 IMS Medallion Lecture: Local Rademacher Complexities and Oracle Inequalities in Risk Minimization” by V. Koltchinskii
1. Introduction. This paper unifies and extends important theoretical results on empirical risk minimization and model selection. It makes extensive and efficient use of new probability inequalities for the amount of concentration of the (possibly symmetrized) empirical process around its mean. The results are very subtle and very pleasing indeed, as they show that oracle inequalities exist for...
Rejoinder: 2004 IMS Medallion Lecture: Local Rademacher Complexities and Oracle Inequalities in Risk Minimization
of the true risk function $\mathcal{F} \ni f \mapsto Pf$. The first quantity of interest is the $L_2$-diameter of this set, $D(\mathcal{F}; \delta)$, and the second one is the function $\phi_n(\mathcal{F}; \delta)$ that is equal to the expected supremum of the empirical process indexed by the differences $f - g$, $f, g \in \mathcal{F}(\delta)$. These two functions are then combined in the expression $\bar U_n(\delta; t)$ that has its roots in Talagrand’s concentration inequalities for empi...
Discussion of “2004 IMS Medallion Lecture: Local Rademacher Complexities and Oracle Inequalities in Risk Minimization” by V. Koltchinskii
Koltchinskii is to be congratulated for developing a unified framework. This elegant framework is general and allows a user to apply it directly instead of deriving bounds in each risk minimization problem. In the past decade, the problem of risk minimization has been extensively studied in function estimation and classification. In function estimation, it has been investigated using the empiri...
Medallion Lecture: Local Rademacher Complexities and Oracle Inequalities in Risk Minimization
Let $\mathcal{F}$ be a class of measurable functions $f : S \mapsto [0,1]$ defined on a probability space $(S, \mathcal{A}, P)$. Given a sample $(X_1, \dots, X_n)$ of i.i.d. random variables taking values in $S$ with common distribution $P$, let $P_n$ denote the empirical measure based on $(X_1, \dots, X_n)$. We study an empirical risk minimization problem $P_n f \to \min$, $f \in \mathcal{F}$. Given a solution $\hat f_n$ of this problem, the goal is to obtain very ge...
Discussion of “2004 IMS Medallion Lecture: Local Rademacher Complexities and Oracle Inequalities in Risk Minimization” by V. Koltchinskii
In this magnificent paper, Professor Koltchinskii offers general and powerful performance bounds for empirical risk minimization, a fundamental principle of statistical learning theory. Since the elegant pioneering work of Vapnik and Chervonenkis in the early 1970s, various such bounds have been known that relate the performance of empirical risk minimizers to combinatorial and geometrical feat...